Degree-corrected distribution-free model for community detection in weighted networks

A degree-corrected distribution-free model is proposed for weighted social networks with latent structural information. The model extends previous distribution-free models by accounting for variation in node degree, so that it fits real-world weighted networks, and it also extends the classical degree-corrected stochastic block model from un-weighted to weighted networks. We design an algorithm based on the idea of spectral clustering to fit the model. A theoretical framework for consistent estimation by the algorithm is developed under the model, and theoretical results are analyzed for edge weights generated from different distributions. We also propose a general modularity as an extension of Newman's modularity from un-weighted to weighted networks. Using experiments with simulated and real-world networks, we show that our method significantly outperforms the uncorrected one and that the general modularity is effective.

Network data analysis has been an important research topic in a range of scientific disciplines in recent years, particularly in biological science, social science, physics and computer science. Many researchers analyze these networks by developing models, quantitative tools and theoretical frameworks to gain a deeper understanding of the underlying structural information. A problem of major interest in network science is "community detection". The Stochastic Blockmodel (SBM) [1] is a classic model for community detection in un-weighted networks. In SBM, every node in the same community shares the same expected degree, which is unrealistic since node degrees vary in most real-world networks. To overcome this limitation of SBM, the popular Degree-Corrected Stochastic Blockmodel (DCSBM) proposed in [2] extends SBM by considering node heterogeneity, allowing nodes in the same community to have different expected degrees. Many community detection methods and theoretical studies have been developed under SBM and DCSBM; to name a few, see [3][4][5][6][7][8] and references therein.
However, most works built under SBM and DCSBM require the elements of the network's adjacency matrix to follow a Bernoulli distribution, which limits the network to being un-weighted. Modeling weighted networks and designing methods to quantitatively detect their latent structural information are interesting topics. In recent years, several Weighted Stochastic Blockmodels (WSBM) have been developed for weighted networks; to name a few, see [9][10][11][12][13][14][15]. However, though these models are attractive, they always require all elements of the connectivity matrix to be nonnegative, or all elements of the adjacency matrix to follow some specific distribution, as noted in [16]. Furthermore, spectral clustering is widely used to study the structure of networks under SBM and DCSBM, for example [17][18][19][20][21][22]. Another limitation of the above WSBMs is that it is challenging to develop methods that take advantage of the spectral clustering idea under them, because of their complex forms or strict constraints on the edge distribution. To overcome the limitations of these weighted models, [16] proposes the Distribution-Free Model (DFM), which has no requirement on the distribution of the adjacency matrix's elements and allows methods based on spectral clustering to fit the model. DFM can be seen as a direct extension of SBM, and nodes within the same community under DFM share the same expected degree, which is unrealistic for empirical networks with varying node degrees.
School of Mathematics, China University of Mining and Technology, Xuzhou 221116, People's Republic of China. email: qinghuan@cumt.edu.cn

In this paper, we develop a model called the Degree-Corrected Distribution-Free Model (DCDFM) as an extension of DFM that considers node heterogeneity. We extend the previous results in the following ways: (a) DCDFM models weighted networks by allowing nodes within the same community to have different expected degrees. Though the WSBM developed in [12] also considers node heterogeneity, it requires all elements of the connectivity matrix to be nonnegative, and fitting it by spectral clustering is challenging. Our DCDFM inherits the advantages of DFM: it places no constraint on the distribution of the adjacency matrix, allows the connectivity matrix to have negative entries, and can be fitted by applying the idea of spectral clustering. Meanwhile, as an extension of DFM, analogous to the relationship between SBM and DCSBM, DCDFM allows nodes within the same community to have different expected degrees, and this ensures that DCDFM can model real-world weighted networks in which node degrees vary.
(b) To fit DCDFM, an efficient spectral clustering algorithm called nDFA is designed. We build a theoretical framework for consistent estimation by the proposed algorithm under DCDFM. Benefiting from the distribution-free property of DCDFM, our theoretical results under DCDFM are general. In particular, when DCDFM reduces to DFM, our theoretical results are consistent with those under DFM. When DCDFM degenerates to DCSBM, our results also match classical results under DCSBM. Numerical results on both simulated and real-world networks show the advantage of introducing node heterogeneity to model weighted networks.
(c) To measure the performance of different methods on real-world weighted networks with unknown node labels, we propose a general modularity as an extension of the classical Newman's modularity [23]. For a weighted network in which all edge weights are nonnegative, the general modularity is exactly Newman's modularity. For a weighted network in which some edge weights are negative, the general modularity takes the negative edge weights into account. Numerical results on simulated networks generated under DCDFM for different distributions, and on empirical un-weighted and weighted networks with known ground-truth node labels, support the effectiveness of the general modularity. By using two community-oriented topological measures introduced in [24], we find that the modularity is effective and that nDFA returns reasonable community partitions for real-world weighted networks with unknown ground-truth node labels.
Notations. We take the following general notations in this paper. For any positive integer m, let [m] = {1, 2, . . . , m}.

Degree-corrected distribution-free model
Let N be an undirected weighted network with n nodes. Let A be the n × n symmetric adjacency matrix of N, where A(i, j) denotes the weight between node i and node j for all node pairs. Since we consider weighted networks, A(i, j) is a finite real value, and it can even be negative, for i, j ∈ [n]. Throughout this article, we assume that in network N,

$$\text{all nodes belong to } K \text{ disjoint communities } C_1, C_2, \ldots, C_K. \tag{1}$$

Let ℓ be an n × 1 vector such that ℓ(i) = k if node i belongs to community k, for i ∈ [n], k ∈ [K]. For convenience, let $Z \in \{0,1\}^{n\times K}$ be the membership matrix such that, for i ∈ [n],

$$\operatorname{rank}(Z) = K, \quad Z(i,\ell(i)) = 1, \quad \|Z(i,:)\|_1 = 1. \tag{2}$$

In Eq. (2), rank(Z) = K means that each community $C_k$ has at least one node for k ∈ [K], while Z(i, ℓ(i)) = 1 and $\|Z(i,:)\|_1 = 1$ mean that Z(i, k) = 1 if k = ℓ(i) and Z(i, k) = 0 if k ≠ ℓ(i), for i ∈ [n], k ∈ [K], i.e., each node belongs to exactly one of the K communities.
Let $n_k = |\{i \in [n] : \ell(i) = k\}|$ be the size of community k for k ∈ [K]. Set $n_{\max} = \max_{k\in[K]} n_k$ and $n_{\min} = \min_{k\in[K]} n_k$. Let the connectivity matrix $P \in \mathbb{R}^{K\times K}$ satisfy

$$P = P', \quad \operatorname{rank}(P) = K, \quad |P_{\max}| = 1, \tag{3}$$

where $|P_{\max}| = \max_{k,l\in[K]} |P(k,l)|$. Eq. (3) means that P is a full rank symmetric matrix, and we set the maximum absolute value of P's entries to 1 mainly for convenience. Meanwhile, it should be emphasized that Eq. (3) allows P to have negative elements. Unless specified, K is assumed to be known in this paper.
Let θ be an n × 1 vector such that θ(i) is the node heterogeneity parameter (also known as degree heterogeneity) of node i, for i ∈ [n]. Let Θ be an n × n diagonal matrix whose i-th diagonal element is θ(i). For convenience, set $\theta_{\max} = \max_{i\in[n]} \theta(i)$ and $\theta_{\min} = \min_{i\in[n]} \theta(i)$. Since all entries of θ are node heterogeneities, we have

$$\theta(i) > 0, \quad i \in [n]. \tag{4}$$

For an arbitrary distribution F and all pairs (i, j) with i, j ∈ [n], our model assumes that A(i, j) are independent random variables generated according to F with expectation

$$\mathbb{E}[A(i,j)] = \Omega(i,j), \quad \text{where } \Omega := \Theta Z P Z' \Theta. \tag{5}$$

Eq. (5) means that we only assume that all elements of A are independent random variables and that $\mathbb{E}[A] = \Theta Z P Z' \Theta$, without any prior knowledge of the specific distribution of A(i, j) for i, j ∈ [n], since the distribution F can be arbitrary. This assumption is reasonable because we can always generate a random number A(i, j) from a distribution F with expectation Ω(i, j). So, instead of fixing F to be a special distribution, A is allowed to be generated from any distribution F as long as the block structure in Eq. (5) holds under DCDFM.

Definition 1 Call model (1)-(5) the Degree-Corrected Distribution-Free Model (DCDFM), and denote it by $DCDFM_n(K, P, Z, \Theta)$.
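As a small illustration of the block structure, the following sketch builds the expectation matrix entry by entry, using Ω(i, j) = θ(i)θ(j)P(ℓ(i), ℓ(j)). The function name `expected_matrix`, the zero-based labels and all numerical values are assumptions made for this example only; they are not taken from the paper.

```python
def expected_matrix(labels, theta, P):
    """Return Omega with Omega[i][j] = theta[i] * theta[j] * P[l(i)][l(j)]."""
    n = len(labels)
    return [[theta[i] * theta[j] * P[labels[i]][labels[j]] for j in range(n)]
            for i in range(n)]

# Toy instance (illustrative values, not from the paper).
labels = [0, 0, 1, 1]            # l(i), using zero-based community indices
theta = [0.9, 0.3, 0.6, 0.2]     # positive node heterogeneity parameters
P = [[1.0, -0.4],                # symmetric, full rank, max |entry| equal to 1,
     [-0.4, 0.5]]                # and allowed to contain negative entries

Omega = expected_matrix(labels, theta, P)

# Row sums of Omega: nodes 0 and 1 share a community but differ by the
# factor theta[0] / theta[1], which is the degree correction at work.
expected_degree = [sum(row) for row in Omega]
```

Any distribution F with these entrywise means can then be used to draw A, which is the sense in which the model is distribution-free.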
Lemma 1 says that rows of $U_*$ corresponding to nodes of the same cluster are equal. This suggests that applying the k-means algorithm to all rows of $U_*$ with K clusters exactly recovers node memberships up to a permutation of node labels, since $U_*$ has K different rows and $U_*(i,:) = U_*(\bar{i},:)$ if nodes i and ī belong to the same community, for i, ī ∈ [n].
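A short derivation sketch of why row-normalization removes θ in the oracle case. It assumes, since the definition is not shown above, that $\Omega = U\Lambda U'$ is the compact eigendecomposition of Ω with $U \in \mathbb{R}^{n\times K}$, $U'U = I_K$ and Λ the diagonal matrix of the K nonzero eigenvalues, and that $U_*$ is the row-normalized version of U. The matrix B is an auxiliary symbol introduced only for this sketch.

```latex
\begin{align*}
U &= \Omega U \Lambda^{-1}
   = \Theta Z \underbrace{\left(P Z' \Theta U \Lambda^{-1}\right)}_{=:\,B \,\in\, \mathbb{R}^{K\times K}}
   = \Theta Z B, \\
U(i,:) &= \theta(i)\, B(\ell(i),:), \\
U_*(i,:) &= \frac{U(i,:)}{\|U(i,:)\|_F}
   = \frac{B(\ell(i),:)}{\|B(\ell(i),:)\|_F}
   \qquad \text{since } \theta(i) > 0 .
\end{align*}
```

Because U has rank K and U = ΘZB, the matrix B also has rank K, so its rows are nonzero and pairwise non-parallel. The normalized rows therefore depend on i only through ℓ(i) and take exactly K distinct values.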
The above analysis is under the oracle case when Ω is given under DCDFM; now we turn to the real case where we only have A obtained from the weighted network N and the known number of communities K. Since the label vector ℓ is unknown in the real case, our goal is to use (A, K) to predict it. Let $\tilde{A} = \hat{U}\hat{\Lambda}\hat{U}'$ be the leading K eigen-decomposition of A such that $\hat{U} \in \mathbb{R}^{n\times K}$, $\hat{\Lambda} \in \mathbb{R}^{K\times K}$, $\hat{U}'\hat{U} = I_K$, and $\hat{\Lambda}$ contains the leading K eigenvalues of A. Let $\hat{U}_* \in \mathbb{R}^{n\times K}$ be the row-normalized version of $\hat{U}$ such that $\hat{U}_*(i,:) = \hat{U}(i,:)/\|\hat{U}(i,:)\|_F$ for i ∈ [n]. The details of our normalized Distribution-Free Algorithm (nDFA for short) are described in Algorithm 1, and it can be programmed in only a few lines of MATLAB code.
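The three steps just described (leading K eigen-subspace, row-normalization, k-means) can be sketched in plain Python as follows. This is not the paper's Algorithm 1 or its MATLAB code. It is a dependency-free illustration under several assumptions: "leading" eigenvalues are taken to mean largest in magnitude, the subspace is found by orthogonal iteration, and k-means uses a deterministic farthest-point start. All function names are hypothetical. Orthogonal iteration returns an orthonormal basis of the leading subspace rather than the eigenvectors themselves, which is enough here because row norms and k-means distances are unchanged by an orthogonal change of basis.

```python
import math
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def orthonormalize(V):
    """Gram-Schmidt on the columns of V (assumes they are linearly independent)."""
    basis = []
    for c in (list(col) for col in zip(*V)):
        for b in basis:
            dot = sum(x * y for x, y in zip(c, b))
            c = [x - dot * y for x, y in zip(c, b)]
        norm = math.sqrt(sum(x * x for x in c))
        basis.append([x / norm for x in c])
    return [list(row) for row in zip(*basis)]

def leading_subspace(A, K, iters=200, seed=0):
    """Orthonormal basis of the span of the K largest-magnitude eigenvectors."""
    rng = random.Random(seed)
    V = orthonormalize([[rng.gauss(0, 1) for _ in range(K)] for _ in A])
    for _ in range(iters):
        V = orthonormalize(matmul(A, V))
    return V

def row_normalize(U):
    """Scale every nonzero row to unit Euclidean norm."""
    out = []
    for row in U:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out

def dist2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(X, K, iters=100):
    """Lloyd's algorithm with farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < K:
        centers.append(max(X, key=lambda x: min(dist2(x, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(K), key=lambda k: dist2(x, centers[k])) for x in X]
        for k in range(K):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def ndfa_sketch(A, K):
    return kmeans(row_normalize(leading_subspace(A, K)), K)

# Oracle check on a noiseless expectation matrix (toy values).
truth = [0, 0, 0, 1, 1, 1]
theta = [1.0, 0.5, 0.2, 0.8, 0.4, 0.9]
P = [[1.0, 0.2], [0.2, 1.0]]
Omega = [[theta[i] * theta[j] * P[truth[i]][truth[j]] for j in range(6)]
         for i in range(6)]
est = ndfa_sketch(Omega, 2)
```

For networks of realistic size, a sparse eigensolver from a numerical library would replace `leading_subspace`; the pure-Python version is kept only so that the example has no dependencies.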
We name our algorithm nDFA to stress the normalization procedure, which aims at cancelling the effect of node heterogeneity, and the distribution-free property, which aims at modeling weighted networks. The idea of normalizing each row of $\hat{U}$ to remove the effect of node heterogeneity can also be found in [18,20]. It is also possible to remove the effect of Θ by using the entry-wise ratios between the leading eigenvector and the other leading eigenvectors of A, as proposed in [19], and we leave the study of this approach under DCDFM for future work.
Here, we provide the complexity analysis of nDFA. The most expensive step is the eigenvalue decomposition, which requires $O(n^3)$ time [26]. The row-normalization step costs $O(n^2)$, and the k-means step costs $O(nlK^2)$, where l is the number of k-means iterations, and we set l = 100 for nDFA in this article. So the overall computational complexity of nDFA is $O(n^3)$. Although this is time-consuming when n becomes huge, many works focus on spectral clustering for un-weighted network community detection, see [17][18][19][20][22][27][28][29][30][31][32][33][34]. Meanwhile, although the random-projection and random-sampling ideas developed in [35] could be used to accelerate nDFA, this is beyond the scope of this article, and we leave it for future work.

Consistency of nDFA
To establish a theoretical guarantee of nDFA's consistency under DCDFM, we need the following assumption.

Assumption 1 Assume
The above assumption is mild since it only requires that all elements of A and Ω, and the variances of A's entries, are finite. We emphasize that Assumption 1 uses no prior knowledge of any specific distribution of A(i, j) under DCDFM for all nodes, and thus it does not violate the distribution-free property of the proposed model. To build a theoretical guarantee of consistent estimation, we need the following assumption.
On the one hand, when all elements of A are nonnegative, Assumption 2 imposes a lower bound on network sparsity. To better understand network sparsity, consider the case where F is a distribution such that all entries of A are nonnegative. Then $\sum_{j=1}^{n} A(i,j)$ is the degree of node i and $\sum_{j=1}^{n} \Omega(i,j) = \theta(i)\sum_{j=1}^{n}\theta(j)P(\ell(i),\ell(j))$ is the expected degree of node i. In particular, when $\Theta = \sqrt{\rho} I_n$ and F is the Bernoulli, Poisson or Binomial distribution, we have $\Omega = \rho ZPZ'$, which gives $\mathbb{P}(A(i,j) = m) = \rho P(\ell(i),\ell(j))$ for some m > 0, so ρ controls the sparsity of such a weighted or un-weighted network. Meanwhile, a sparsity assumption is common when proving estimation consistency for spectral clustering methods, for example in consistency works for un-weighted network community detection such as [19,20]. In particular, when F is the Bernoulli distribution and $\Theta = \sqrt{\rho} I_n$, so that DCDFM reduces to SBM, γ and τ have an upper bound of 1, and Assumption 2 requires that ρn ≥ log(n), which is consistent with the sparsity requirement under SBM in [20]. This guarantees that our requirement on network sparsity matches the classical result when DCDFM degenerates to SBM. On the other hand, when F allows A to have negative entries, θ is not related to network sparsity but is only a heterogeneity parameter, because it is meaningless to define sparsity for an adjacency matrix with negative elements. For this case, Assumption 2 merely controls θ for our theoretical framework. Though γ is assumed to be a finite number, we also include it in Assumption 2 because γ is directly related to the variance of A's elements, i.e., γ has a close relationship with the distribution F even though F can be an arbitrary distribution.
After obtaining our main results for nDFA, we will use some examples to show that τ and γ are always finite, or at least can be set to finite numbers, so that Assumption 2 holds under different choices of F. Meanwhile, we make Assumptions 1 and 2 mainly for the convenience of the theoretical analysis of nDFA's consistent estimation, and these two assumptions are irrelevant to the identifiability of our model DCDFM. Based on the above two assumptions, the following lemma bounds $\|A - \Omega\|$ with an application of the Bernstein inequality [36].
$$\|A - \Omega\| = O\left(\sqrt{\gamma\,\theta_{\max}\|\theta\|_1 \log(n)}\right).$$

The bound obtained in Lemma 2 is directly related to our main result for nDFA. To measure nDFA's performance theoretically, we apply the clustering error of [22] for its theoretical convenience. Set $\hat{C}_k = \{i : \hat{\ell}(i) = k\}$ for k ∈ [K]. Define the clustering error f as in [22], where $S_K$ denotes the set of all permutations of {1, 2, . . . , K}. Actually, using the clustering errors in [18][19][20] to measure nDFA's performance also works, and we use f in this paper mainly for its convenience in proofs. The following theorem is the main result for nDFA, and it shows that nDFA enjoys asymptotically consistent estimation under the proposed model.

Proof
where we have used the fact that $\lambda_K(Z'Z) = n_{\min}$ in the last equality. Finally, this theorem follows by Lemma 2.
From Theorem 1, we see that decreasing $\theta_{\min}$ increases the upper bound of the error rate. This is natural, since a smaller $\theta_{\min}$ gives a higher probability of generating an isolated node with no connections to other nodes, and thus a harder case for community detection; the same fact is also found in [19] under DCSBM. It is also harder to detect node labels for a network generated under a smaller $|\lambda_K(P)|$ and $n_{\min}$, and such facts are also found in [20] under DCSBM. Adding some conditions on the model parameters, we obtain the following corollary by basic algebra.
Corollary 1 Under $DCDFM_n(K, P, Z, \Theta)$, when the conditions in Theorem 1 hold, with probability at least $1 - o(n^{-3})$,

When $\theta(i) = \sqrt{\rho}$ for i ∈ [n], so that DCDFM reduces to DFM, the theoretical results for nDFA under DCDFM are consistent with those under DFM given in Theorem 1 of [16]. For the third bullet of Corollary 1, we see that $|\lambda_K(P)|$ should shrink slower than $\sqrt{\gamma\log(n)/(\rho n)}$ for consistent estimation, and it should shrink slower than $\sqrt{\log(n)/n}$ when ρ is a constant and γ is finite. When $\lambda_K(P)$ and γ are fixed, we see that ρ should shrink slower than $\log(n)/n$, and this is consistent with Assumption 2. Generally speaking, the finiteness of γ is significant because we can ignore the effect of γ in our theoretical bounds as long as γ is finite. Next, we use some examples under different distributions to show that τ and γ are finite or can always be set to be finite.
Following a similar analysis to [16], we let F be some specific distributions as examples to show the generality of DCDFM as well as nDFA's consistent estimation under DCDFM. For i, j ∈ [n], we mainly bound γ to show that γ is finite (i.e., the 2nd bullet in Assumption 1 holds under different distributions) and then obtain the error rates of nDFA by considering the distributions below under DCDFM. Details on the probability mass function or probability density function of these distributions can be found in http://www.stat.rice.edu/~dobelman/courses/texts/distributions.c&b.pdf.

www.nature.com/scientificreports/
Example 1 when F is Normal distribution such that $A(i,j) \sim \mathrm{Normal}(\Omega(i,j), \sigma_A^2)$, and all entries of A are finite real numbers. Since the mean of a Normal distribution can be negative, DCDFM allows P to have negative entries as long as P is full rank. In this case, γ is finite. From the resulting error bound, we see that increasing $\sigma_A^2$ increases the error rate, so a smaller $\sigma_A^2$ is preferred, which is also verified by Experiment 1[b] in Section 5. For convenience, setting $\sigma_A^2 \le C\theta_{\min}^2$ for some C > 0 makes Assumption 2 equivalent to requiring $\theta_{\max}\|\theta\|_1/\log(n) \to \infty$ as n → ∞, since τ is finite.

Example 2 when F is Binomial distribution such that $A(i,j) \sim \mathrm{Binomial}(m, \Omega(i,j)/m)$ for some positive integer m, and all entries of A are integers in {0, 1, 2, . . . , m}. Here τ ≤ m. By the property of the Binomial distribution, γ = 1, and the error rate in this case can be obtained immediately.
Example 3 when F is Bernoulli distribution such that $A(i,j) \sim \mathrm{Bernoulli}(\Omega(i,j))$, all entries of P are nonnegative, and DCDFM reduces to the DCSBM considered in [19,20]. For this case, all entries of A are either 0 or 1, i.e., the network is un-weighted, and τ ≤ 1. Moreover, γ = 1 is finite, so Assumption 1 holds. Setting γ = 1 in Theorem 1 gives the theoretical upper bound of nDFA's error rate under DCDFM immediately.

Example 4 when F is Poisson distribution such that $A(i,j) \sim \mathrm{Poisson}(\Omega(i,j))$ as in [2], and all entries of A are nonnegative integers. For the Poisson distribution, all entries of P should be nonnegative, and τ is finite as long as A's elements are generated from the Poisson distribution.

Example 5 when F is such that all non-diagonal elements of A are either 1 or −1 (a signed network). For this case, all entries of P are real values, and Ω(i, j) should be set such that the probabilities of observing 1 and −1 are valid.

Since our model DCDFM has no limitation on the choice of distribution F as long as Eq. (5) holds, setting F to be any other distribution obeying Eq. (5) is also possible (see the Double exponential, Exponential, Gamma and Uniform distributions in http://www.stat.rice.edu/~dobelman/courses/texts/distributions.c&b.pdf), and this guarantees the generality of our model as well as our theoretical results.
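For the signed case, the requirement that a {−1, 1}-valued entry has expectation Ω(i, j) pins down its distribution. The sketch below works this out and samples from it. The function names are hypothetical, and the formula is derived here from the mean constraint alone rather than quoted from the paper.

```python
import random

def signed_entry_prob(omega):
    """P(A(i,j) = 1) for a {-1, +1}-valued entry with mean omega.

    Solving p * 1 + (1 - p) * (-1) = omega gives p = (1 + omega) / 2,
    which is a valid probability only when -1 <= omega <= 1.
    """
    if not -1.0 <= omega <= 1.0:
        raise ValueError("a {-1, +1}-valued variable needs its mean in [-1, 1]")
    return (1.0 + omega) / 2.0

def sample_signed_entry(omega, rng):
    """Draw one entry equal to +1 or -1 with expectation omega."""
    return 1 if rng.random() < signed_entry_prob(omega) else -1

rng = random.Random(0)
omega = -0.3                                  # illustrative expectation
p = signed_entry_prob(omega)
mean_check = 1 * p + (-1) * (1 - p)           # recovers omega up to rounding
draws = [sample_signed_entry(omega, rng) for _ in range(20000)]
```

The same pattern applies to any F: choose the parameters so that the mean equals Ω(i, j), then check which values of Ω(i, j) keep those parameters valid.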

Experimental results
Both simulated and empirical data are used to compare nDFA with the existing algorithm DFA developed in [16] for weighted networks, where DFA applies k-means to all rows of $\hat{U}$ with K clusters to estimate node labels. All code for the experimental results in this paper is executed in MATLAB R2021b. Though our model DCDFM and our algorithm nDFA are also applicable to networks with self-connected nodes, unless specified, we do not consider loops in this part. Before presenting experimental results, we introduce the general modularity for weighted network community detection in the next subsection. The general modularity can be seen as a measure of performance for any algorithm designed for weighted networks, and we will also test its effectiveness on both simulated and empirical networks.
General modularity for weighted networks. Unlike in un-weighted networks, the notion of node degree in weighted networks needs care, especially when A has negative elements. For un-weighted networks, in which all entries of A are either 0 or 1, and for weighted networks in which all entries of A are nonnegative, the degree of node i is always defined as $\sum_{j=1}^{n} A(i,j)$. However, for weighted networks in which A has negative entries, $\sum_{j=1}^{n} A(i,j)$ does not measure the degree of node i. Instead, to measure node degree for all kinds of weighted networks, we define a general degree of node i. Under this definition, θ(i) is a measure of the "degree" of node i, especially when all entries of A are nonnegative. The general modularity Q is then defined in Eq. (7), where $\hat{\ell}$ is the n × 1 vector such that $\hat{\ell}(i)$ denotes the cluster that node i belongs to. Its form reflects the fact that a negative A(i, j) may not mean that nodes i and j tend to be in different clusters, since A's negative elements may be generated from a Normal distribution, a Logistic distribution or some other distribution. We set $Q = Q^+ - Q^-$ empirically since such a modularity is a good measure for investigating the performance of different algorithms.
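The exact form of Eq. (7) is not reproduced above, so the following is only a sketch under an explicit assumption: $Q^+$ is Newman's modularity of the positive part max(A, 0) and $Q^-$ is Newman's modularity of the negative part max(−A, 0), combined as $Q = Q^+ - Q^-$. With this choice Q equals Newman's modularity whenever A is nonnegative, which agrees with the property stated in the text. The function names are hypothetical.

```python
def newman_modularity(W, labels):
    """Newman's modularity for a symmetric nonnegative weight matrix W."""
    n = len(W)
    degree = [sum(row) for row in W]
    two_m = sum(degree)                      # twice the total edge weight
    if two_m == 0:
        return 0.0                           # no edges of this sign
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += W[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def general_modularity(A, labels):
    """Assumed form Q = Q_plus - Q_minus built from the signed parts of A."""
    positive = [[max(a, 0.0) for a in row] for row in A]
    negative = [[max(-a, 0.0) for a in row] for row in A]
    return newman_modularity(positive, labels) - newman_modularity(negative, labels)

# Signed toy network: +1 inside the two groups, -1 across them.
A = [[0, 1, -1, -1],
     [1, 0, -1, -1],
     [-1, -1, 0, 1],
     [-1, -1, 1, 0]]
q_good = general_modularity(A, [0, 0, 1, 1])   # partition matching the signs
q_bad = general_modularity(A, [0, 1, 0, 1])    # partition cutting positive edges
```

On this toy network the sign-consistent partition receives the larger score, which is the behaviour a modularity for signed weights is expected to show.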
In Eq. (7), we write Q as Q(•) for convenience, where • denotes a certain community detection method, since $\hat{\ell}$ is obtained by running method • on A with K communities. Next, we define the effectiveness of the general modularity Q. Since f is a stronger criterion than the Hamming error [19], for numerical studies the Hamming error rate defined below is applied to investigate the performance of algorithms.
Here, $P_K$ is the set of all K × K permutation matrices, the matrix $\hat{Z} \in \mathbb{R}^{n\times K}$ is defined as $\hat{Z}(i,k) = 1$ if $\hat{\ell}(i) = k$ and 0 otherwise for i ∈ [n], k ∈ [K], and $\hat{\ell}$ is the label vector returned by applying method • to A with K communities. Here f(•1) < f(•2) means that method •1 outperforms method •2. Now, we are ready to define the effectiveness of Q when f(•1) ≠ f(•2). We do not consider the case f(•1) = f(•2) because Q(•2) = Q(•1) for this case and it does not tell us anything about the effectiveness of Q. On the one hand, $E_Q(•1, •2) = 1$ means that if method •1 outperforms method •2 (i.e., f(•1) < f(•2)), then we also have Q(•1) ≥ Q(•2), i.e., the general modularity Q is effective. On the other hand, $E_Q(•1, •2) = -1$ means that if method •1 outperforms method •2, we have Q(•1) < Q(•2), which means that Q is ineffective. For any experiment, suppose we generate N adjacency matrices A under a community detection model; we then obtain N values of $E_Q(•1, •2)$. The ratio of effectiveness $R_{E_Q}(•1, •2)$ is defined accordingly, where $N_0$ is the number of adjacency matrices such that f(•1) = f(•2), since the effectiveness of the general modularity is defined only when f(•1) ≠ f(•2). A larger $R_{E_Q}(•1, •2)$ indicates greater effectiveness of the general modularity Q obtained by applying methods •1 and •2.
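The following sketch shows how these quantities could be computed. Since the displayed formulas are not reproduced above, two assumptions are made: the Hamming error rate is taken as the smallest fraction of mislabeled nodes over all relabelings of the K clusters, and the ratio of effectiveness is taken as the share of cases with $E_Q = 1$ among the $N - N_0$ cases without a tie. All function names are hypothetical.

```python
from itertools import permutations

def hamming_error(est, truth, K):
    """Smallest fraction of mislabeled nodes over all relabelings of est."""
    best = len(truth)
    for perm in permutations(range(K)):
        mismatches = sum(1 for e, t in zip(est, truth) if perm[e] != t)
        best = min(best, mismatches)
    return best / len(truth)

def effectiveness(f1, f2, q1, q2):
    """E_Q for one adjacency matrix.

    Returns 1 if the method with the smaller error has modularity at least
    as large as the other, -1 if not, and None when the errors tie.
    """
    if f1 == f2:
        return None
    if f1 < f2:
        return 1 if q1 >= q2 else -1
    return 1 if q2 >= q1 else -1

def effectiveness_ratio(records):
    """Share of E_Q = 1 among the records (f1, f2, q1, q2) without a tie."""
    values = [effectiveness(*r) for r in records]
    informative = [v for v in values if v is not None]
    if not informative:
        return float("nan")
    return sum(1 for v in informative if v == 1) / len(informative)

# Four hypothetical runs: (error of method 1, error of method 2, Q1, Q2).
runs = [(0.1, 0.3, 0.4, 0.2),    # method 1 better, Q agrees
        (0.2, 0.1, 0.5, 0.3),    # method 2 better, Q disagrees
        (0.1, 0.1, 0.3, 0.3),    # tie, not counted
        (0.3, 0.2, 0.1, 0.4)]    # method 2 better, Q agrees
ratio = effectiveness_ratio(runs)
```

Enumerating all K! permutations is fine for the small K used in these experiments; for larger K an assignment solver would be the usual replacement.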
Simulations. In numerical simulations, we aim at comparing nDFA with DFA under DCDFM by reporting f(nDFA) and f(DFA), and at investigating the effectiveness of Q by reporting $R_{E_Q}(\mathrm{DFA}, \mathrm{nDFA})$ computed from nDFA and DFA.
In all simulated data, unless specified, set n = 400 and K = 4, generate ℓ such that each node belongs to each community with equal probability, and let ρ > 0 be a parameter such that θ(i) = ρ × rand(1), where rand(1) is a random value in the interval (0, 1). ρ is regarded as a sparsity parameter controlling the sparsity of network N. When P, Z and Θ are set, Ω = ΘZPZ′Θ. Generate the symmetric adjacency matrix A by letting A(i, j) be generated from a distribution F with expectation Ω(i, j). Different distributions will be studied in the simulations, and we show the error rates of different methods, averaged over 100 random runs for each setting of the model parameters.
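A minimal sketch of this generating procedure for the Normal case, in plain Python. The function name `simulate_dcdfm_normal`, the connectivity matrix `P_demo` and the parameter values in the usage lines are assumptions for illustration only; the paper's own P matrices are not reproduced here, and the paper's code is in MATLAB.

```python
import random

def simulate_dcdfm_normal(n, K, rho, P, sigma, seed=0):
    """Draw a symmetric, loop-free A with Normal entries of mean Omega(i, j).

    labels: each node joins one of the K communities with equal probability.
    theta:  theta(i) = rho * u with u uniform on the unit interval
            (random() can in principle return exactly 0.0; ignored here).
    sigma:  standard deviation of the Normal noise.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(K) for _ in range(n)]
    theta = [rho * rng.random() for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):            # the diagonal stays zero: no loops
            omega_ij = theta[i] * theta[j] * P[labels[i]][labels[j]]
            A[i][j] = A[j][i] = rng.gauss(omega_ij, sigma)
    return A, labels, theta

# Hypothetical connectivity matrix: symmetric, full rank, max |entry| = 1.
P_demo = [[1.0, 0.2, -0.3, 0.1],
          [0.2, 0.9, 0.1, -0.2],
          [-0.3, 0.1, 0.8, 0.3],
          [0.1, -0.2, 0.3, 1.0]]
A, labels, theta = simulate_dcdfm_normal(n=40, K=4, rho=2.0, P=P_demo, sigma=1.0)
```

Swapping `rng.gauss` for another sampler with the same mean gives the other distributions studied in the experiments.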

Experiment 1: normal distribution.
This experiment studies the case when F is the Normal distribution. A connectivity matrix P is fixed for this experiment. In Fig. 1, we plot the error against ρ. For larger ρ, we get denser networks, and the two methods perform better. When ρ is larger than 5, nDFA outperforms DFA. Meanwhile, Experiment 1[a] generates 10 × 100 = 1000 adjacency matrices in total, where 10 is the cardinality of {1, 2, . . . , 10} and 100 is the number of repetitions for each ρ. From these 1000 adjacency matrices, we calculate $R_{E_Q}(\mathrm{DFA}, \mathrm{nDFA})$ based on DFA and nDFA. $R_{E_Q}(\mathrm{DFA}, \mathrm{nDFA})$ for Experiment 1[a] is reported in Table 1; it is 82.59% (a value much larger than 50%), suggesting the effectiveness of the general modularity. The calculation of $R_{E_Q}(\mathrm{DFA}, \mathrm{nDFA})$ for the other simulated experiments in this paper follows the same procedure.

The increasing error of nDFA as $\sigma_A^2$ increases is consistent with our theoretical findings (see the first bullet given after Corollary 1). Meanwhile, the numerical results also tell us that nDFA significantly outperforms DFA, since DFA always performs poorly when there is node heterogeneity.

Since Ω(i, j)/m is a probability and Ω(i, j) ≤ ρ, ρ should be set less than m. In panel (c) of Fig. 1, we plot the error against ρ. We see that the two methods perform better as ρ increases and that nDFA behaves much better than DFA.

In Fig. 1, we plot the error against m. For larger m, both methods perform worse, and this phenomenon occurs because A(i, j) may take more integer values as m increases when F is the Binomial distribution. The results also show that nDFA performs much better than DFA when variation in node degree is considered.

Remark 7
For visualization, we plot A generated under DFM when F is the Binomial distribution. Let n = 24 and K = 2. Let Z(i, 1) = 1 for 1 ≤ i ≤ 12 and Z(i, 2) = 1 for 13 ≤ i ≤ 24. Let m = 5 and Θ = 0.7I (i.e., a DFM case). For the above setting and the chosen P, two different adjacency matrices are generated under DFM in Fig. 2, where we also report error rates for DFA and nDFA. Meanwhile, since A and Z are known here, one can run DFA and nDFA directly on the matrices A in Fig. 2 with two communities to check the error rates of DFA and nDFA. Furthermore, we also plot adjacency matrices for the Bernoulli distribution, the Poisson distribution and the signed network.

Experiment 3: Bernoulli distribution.
In this experiment, let F be the Bernoulli distribution such that A(i, j) is a random variable generated from Bernoulli(Ω(i, j)). Set P to be the same as in Experiment 2. In Fig. 1, we plot the error against ρ. The results are similar to those of Experiment 2[a], and nDFA enjoys better performance than DFA.

Real data.
In real data analysis, instead of simply using our general modularity for comparative analysis, we also consider the topological comparative evaluation framework proposed in [24]. We only consider two topological measures: embeddedness, which measures how many of the direct neighbours of a node belong to its own community, and community size, which is an important characteristic of the community structure [24]. The internal transitivity, scaled density, average distance and hub dominance introduced in [24] only work for un-weighted networks, while we consider weighted networks in this part. Now we provide the definition of embeddedness [24,38]: for node i, let $d_{\mathrm{int}}(i) = \sum_{j:\hat{\ell}(j)=\hat{\ell}(i)} A(i,j)$ be the internal degree of node i belonging to cluster $\hat{\ell}(i)$ and $d(i) = \sum_{j=1}^{n} A(i,j)$ be the total degree of node i, where $\hat{\ell}$ is the vector of estimated node labels for a certain method •. The embeddedness of node i is defined as

$$e(i) = \frac{d_{\mathrm{int}}(i)}{d(i)},$$

where this definition of embeddedness extends that of [24,38] from un-weighted networks to weighted networks whose adjacency matrix is connected and has nonnegative entries. Extending the definition of embeddedness to adjacency matrices that may contain negative elements is an interesting problem, and we leave it for future work. Meanwhile, e(i) is only defined for one node i; to capture embeddedness for all nodes, we define the overall embeddedness (OE for short) depending on method • as

$$OE(\bullet) = \frac{1}{n}\sum_{i=1}^{n} e(i).$$

As analyzed in [24], the maximal OE(•) of 1 is reached when, for every node, all of its neighbours are in its own community (i.e., $d_{\mathrm{int}}(i) = d(i)$ for all i). However, if method • puts all nodes (or a majority of nodes) into one community, then it can also make OE(•) equal to 1 (or close to 1). Therefore, simply using the overall embeddedness to compare the performance of different community detection methods is not enough, and we also need to consider the general modularity Q(•) and community size.
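A small sketch of the overall embeddedness, assuming e(i) is the ratio of internal degree to total degree and OE is the average of e(i) over all nodes, which is the reading consistent with OE reaching 1 when $d_{\mathrm{int}}(i) = d(i)$ for every node. The function name and the toy matrix are hypothetical.

```python
def overall_embeddedness(A, labels):
    """Average over nodes of (internal degree) / (total degree).

    Assumes A is symmetric with nonnegative entries and every node has a
    positive total degree, as required for a connected network.
    """
    n = len(A)
    total = 0.0
    for i in range(n):
        d = sum(A[i])
        d_int = sum(A[i][j] for j in range(n) if labels[j] == labels[i])
        total += d_int / d
    return total / n

# Toy weighted network with nonnegative weights.
A = [[0, 2, 1, 0],
     [2, 0, 0, 0],
     [1, 0, 0, 3],
     [0, 0, 3, 0]]
oe_two_groups = overall_embeddedness(A, [0, 0, 1, 1])
oe_one_group = overall_embeddedness(A, [0, 0, 0, 0])   # degenerate partition
```

The degenerate one-community partition scores exactly 1, which illustrates why the embeddedness has to be read together with the modularity and the community sizes.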
Set

$$\tau(\bullet) = \max_{1\le k\le K} \frac{|\{i : \hat{\ell}(i) = k\}|}{n},$$

where τ(•) measures the ratio of the size of the largest estimated cluster to the network size. If τ(•) is 1 (or close to 1), method • puts all nodes (or a majority of nodes) into one community. For real-world networks with known true labels, we let error(•) denote the error rate of method •. Finally, T(•) denotes the run-time of method •. For the real-world networks analyzed in this paper, we report the general modularity Q, the overall embeddedness OE, the community size parameter τ, the error rate error (for real-world networks with known true labels) and the run-time T of nDFA and DFA for our comparative analysis.

Real-world un-weighted networks. In this section, four real-world un-weighted networks with known labels are studied to investigate nDFA's empirical performance. Some basic information on the four data sets is displayed in Table 2, where Karate, Dolphins, Polbooks and Weblogs are short for Zachary's karate club, Dolphin social network, Books about US politics and Political blogs, and the four data sets can be downloaded from http://www-personal.umich.edu/~mejn/netdata/. For these real-world un-weighted networks, the true labels are suggested by the original authors, and they are regarded as the "ground truth". Brief introductions of the four networks can be found in [2,18,19,39] and references therein. As in the real data study in [16], since all entries of the adjacency matrices of the four real data sets are 1 or 0 (i.e., the original adjacency matrices are un-weighted), to construct weighted networks we assume there is noise such that we observe the matrix $\hat{A} = A + W$ with noise matrix $W \sim \mathrm{Normal}(0, \sigma_W^2)$, i.e., we use $\hat{A}$ instead of A as the input matrix in nDFA and DFA. We let $\sigma_W^2$ range in {0, 0.01, 0.02, . . . , 0.2}. For each $\sigma_W^2$, we report the error rates of the different methods averaged over 50 random runs and aim to study nDFA's behavior as $\sigma_W^2$ increases. Note that, as in the perturbation analysis of [16], we could add a noise matrix W whose entries have mean 0 and finite variance to our theoretical analysis of nDFA; for convenience, we do not consider perturbation analysis in our theoretical study of nDFA in this paper. We only consider the influence of the noise matrix in our numerical study, to reveal the performance stability of nDFA. Figure 6 displays the error rates against $\sigma_W^2$ for the four real-world social networks. When the noise matrix W has small variance, nDFA has stable performance. When the elements of W vary significantly, nDFA's error rates increase.

DFA also has stable performance when $\sigma_W^2$ is small, except that DFA always performs poorly on the Dolphins and Weblogs networks, even when there is no noise ($\sigma_W^2 = 0$ means a case without noise). For the two networks Karate and Polbooks, nDFA performs similarly to DFA and both methods enjoy satisfactory performance. For Dolphins and Weblogs, nDFA performs much better than DFA. In particular, for the Weblogs network, DFA's error rates are always around 35%, which is a large error rate, while nDFA's error rates are always less than 15%, even for a noise matrix with large variance. This can be explained by the fact that the node degree in the Weblogs network varies heavily, as analyzed in [2,19]. Since nDFA is designed under DCDFM, which considers node heterogeneity, while DFA is designed under DFM, which does not, nDFA can naturally perform better than DFA on real-world networks with variation in node degree. Meanwhile, Q, OE, error, τ and T obtained by applying DFA and nDFA to the adjacency matrices A of the above four real-world networks with known node labels are reported in Table 3. Combining the results in Table 3 and Fig. 6, we see that when nDFA has smaller error rates than DFA, nDFA has larger modularity than DFA, and this suggests that the general modularity is effective for un-weighted networks (note that the general modularity is exactly Newman's modularity when all entries of A are nonnegative). For Dolphins, Polbooks and Weblogs, we see that both the overall embeddedness and the modularity of nDFA are larger than those of DFA, which suggests that nDFA returns more accurate estimates of node labels than DFA, and this is consistent with the fact that nDFA has smaller error rates than DFA for these three networks.
Compared with nDFA, whose error rates are small, the values of τ of DFA for Dolphins and Weblogs are much larger than those of nDFA, which suggests that DFA tends to put nodes into one community. Meanwhile, the small error rates, large overall embeddedness (close to 1), and medium size of the largest estimated community of nDFA suggest that these four networks have community structure that is well suited to community detection. Both methods also run fast on these four networks.
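The noise experiment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that the noise matrix W is symmetrized so that the perturbed input stays symmetric, and that the error rate is the fraction of mislabeled nodes minimized over permutations of the community labels. The names `add_gaussian_noise`, `error_rate`, `average_error` and the callable `cluster_fn` (a stand-in for nDFA or DFA) are hypothetical.

```python
import itertools
import numpy as np

def add_gaussian_noise(A, sigma2, rng):
    """Return A_hat = A + W with W_ij ~ Normal(0, sigma2).

    Assumption: W is symmetrized (upper triangle mirrored) so A_hat is symmetric.
    """
    n = A.shape[0]
    W = rng.normal(0.0, np.sqrt(sigma2), size=(n, n))
    W = np.triu(W) + np.triu(W, 1).T
    return A + W

def error_rate(true_labels, est_labels, K):
    """Fraction of mislabeled nodes, minimized over the K! label permutations."""
    true_labels = np.asarray(true_labels)
    est_labels = np.asarray(est_labels)
    best = 1.0
    for perm in itertools.permutations(range(K)):
        relabeled = np.array(perm)[est_labels]
        best = min(best, float(np.mean(relabeled != true_labels)))
    return best

def average_error(A, true_labels, K, cluster_fn, sigma2_grid, n_runs=50, seed=0):
    """Average error rate of cluster_fn(A_hat, K) over n_runs noise draws per variance."""
    rng = np.random.default_rng(seed)
    out = []
    for sigma2 in sigma2_grid:
        errs = [
            error_rate(true_labels, cluster_fn(add_gaussian_noise(A, sigma2, rng), K), K)
            for _ in range(n_runs)
        ]
        out.append(float(np.mean(errs)))
    return np.array(out)
```

Under these assumptions, the variance grid used in the text would be `np.linspace(0.0, 0.2, 21)`, and `cluster_fn` would be replaced by an implementation of nDFA or DFA that maps a matrix and K to integer labels in {0, ..., K-1}.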
Real-world weighted networks. In this section, we apply nDFA and DFA to five real-world weighted networks: the Karate club weighted network (Karate-weighted for short), the Gahuku-Gama subtribes network, the Coauthorships in network science network (CoauthorshipsNet for short), Condensed matter collaborations 1999 (Con-mat-1999 for short) and Condensed matter collaborations 2003 (Con-mat-2003 for short). For visualization, Fig. 7 shows the adjacency matrices of the first two weighted networks. Table 4 summarizes basic information for the five networks. Detailed information on the five networks is given below.
Karate-weighted: This weighted network is collected from a university karate club. In this network, a node denotes a member, and the weight of an edge between two nodes indicates the relative strength of their association. This network is the weighted version of the Karate club network, so the number of communities is 2 and the true labels of all members are known for Karate-weighted. The data can be downloaded from http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm#kazalo.
Gahuku-Gama subtribes: This data set is the signed social network of tribes of the Gahuku-Gama alliance structure of the Eastern Central Highlands of New Guinea. The network has 16 tribes, and a positive or negative link between two tribes means that they are allies or enemies, respectively. There are 3 communities in this network, and we use the node labels shown in Fig. 9b from 40 as ground truth. The data can be downloaded from http://konect.cc/ (see also 41). Note that since the overall embeddedness is defined for adjacency matrices with nonnegative entries, it is not applicable to this network.
CoauthorshipsNet: This data set can be downloaded from http://www-personal.umich.edu/~mejn/netdata/. In CoauthorshipsNet, a node denotes a scientist and weights denote coauthorship, where the weights are assigned by the original papers. For this network, there is no ground truth for the node labels, and the number of communities is unknown. CoauthorshipsNet has 1589 nodes; however, the network is disconnected. Among ...
Table 3. The general modularity, overall embeddedness, community size parameter, error rate and run-time of DFA and nDFA for networks in Table 2.
Fig. 8 suggests that the number of communities is 2, where 27 also applies the idea of eigengap to estimate the number of communities for real-world networks. Note that though CoauthorshipsNet1589 is disconnected, we can still apply nDFA and DFA to it, since neither method requires network connectivity. Note that since the overall embeddedness is defined for connected adjacency matrices, it is not applicable to CoauthorshipsNet1589.
Con-mat-1999: This data set can be downloaded from http://www-personal.umich.edu/~mejn/netdata/. In this network, a node denotes a scientist and the edge weights are provided by the original papers. The largest connected component of this data set has 13861 nodes. Figure 8 suggests K = 2 for this data set.
Con-mat-2003: This is an updated version of the Con-mat-1999 network, and its largest connected component has 27519 nodes. Figure 8 suggests K = 2 for Con-mat-2003.
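The two preprocessing steps mentioned in these data descriptions, extracting the largest connected component and reading the number of communities off an eigengap, can be illustrated as follows. The exact rule behind Fig. 8 is not stated in this section, so the sketch assumes one common heuristic: K is taken as the position of the largest gap among the leading eigenvalue magnitudes of the adjacency matrix. The function names and the cut-off `k_max` are hypothetical, and a dense eigen-decomposition is used only for brevity (a sparse solver would be preferable for networks with tens of thousands of nodes).

```python
import numpy as np

def largest_component(A):
    """Indices of the largest connected component of a symmetric weighted matrix A.

    Any nonzero entry (positive or negative) is treated as an edge.
    """
    n = A.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = []
    for start in range(n):
        if seen[start]:
            continue
        comp, stack = [], [start]
        seen[start] = True
        while stack:  # depth-first search from `start`
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(A[u]):
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        if len(comp) > len(best):
            best = comp
    return np.sort(np.array(best))

def eigengap_K(A, k_max=10):
    """Estimate K as the position of the largest gap among the k_max + 1
    largest eigenvalue magnitudes of the symmetric matrix A (assumed heuristic)."""
    vals = np.linalg.eigvalsh(A)
    mags = np.sort(np.abs(vals))[::-1][: k_max + 1]
    gaps = mags[:-1] - mags[1:]
    return int(np.argmax(gaps)) + 1
```

A typical use under these assumptions is `idx = largest_component(A)` followed by `eigengap_K(A[np.ix_(idx, idx)])`. Note that this rule returns 1 whenever the gap between the first two eigenvalue magnitudes dominates, so in practice the eigenvalue plot should still be inspected.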
We apply nDFA and DFA to Karate-weighted and Gahuku-Gama subtribes, and find that the error rates of both methods on both data sets are zero, suggesting that nDFA and DFA perform perfectly on these two networks. For visualization, Figs. 9, 10 and 11 show the community detection results of nDFA on these weighted networks, except Con-mat-2003, whose size is too large to plot using the graph command of MATLAB. Note that disconnected components and isolated nodes can also be classified by nDFA, as shown in panel (a) of Fig. 10; this ensures the wide applicability of nDFA, since it can deal with disconnected weighted networks, even ones with isolated nodes. Table 5 records Q, OE, τ and T for the five weighted networks. We find that Q(nDFA) is much larger than Q(DFA) for CoauthorshipsNet, Con-mat-1999 and Con-mat-2003, suggesting that nDFA returns more accurate community detection results than DFA. For CoauthorshipsNet1589, DFA puts 1585 of the 1589 nodes into one community, whereas nDFA puts 1025 of the 1589 nodes into one community. Recalling that Q(nDFA) is much larger than Q(DFA) for CoauthorshipsNet1589, we see that DFA performs poorly because it tends to put nodes into one community, while nDFA performs well and returns a reasonable community structure. For CoauthorshipsNet379, the overall embeddedness of nDFA is larger than that of DFA, and nDFA's τ is much smaller than DFA's, which suggests that nDFA returns a more reasonable community partition for CoauthorshipsNet379 than DFA, since DFA puts almost all nodes into one community. For Con-mat-1999 and Con-mat-2003, though OE(DFA) is larger than OE(nDFA), DFA again puts almost all nodes into one community, as indicated by its large τ. For run-time, we see that nDFA processes real-world weighted networks of up to 28000 nodes within tens of seconds. Generally, nDFA returns larger general modularity and smaller τ than DFA, suggesting that nDFA provides a more reasonable community partition.
For comparative evaluation, the overall embeddedness OE alone is not enough; OE and τ should be combined. A method that returns a larger OE and a smaller τ gives a more reasonable community division, while a method with larger general modularity always enjoys a larger OE and a smaller τ; i.e., when a method gives a reasonable community partition, a larger Q plays a role similar to a larger OE together with a smaller τ, just as nDFA performs on all real-world networks used in this paper. This supports the effectiveness of our general modularity.
Table 5. Q, OE, τ and T of DFA and nDFA for networks in Table 4.
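Two of the quantities compared above are simple to compute from a label vector, and a short sketch may make the comparison concrete. The code below computes the community size parameter τ (the proportion of nodes in the largest estimated community) and Newman's modularity for a symmetric matrix with nonnegative weights, which, as noted earlier in the text, is what the general modularity reduces to when all entries of A are nonnegative. The general modularity for matrices with negative entries and the overall embeddedness OE are defined elsewhere in the paper and are not reproduced here; the function names are hypothetical.

```python
import numpy as np

def community_size_parameter(labels, K):
    """tau: proportion of nodes in the largest estimated community."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=K)
    return counts.max() / labels.size

def newman_modularity(A, labels):
    """Newman's modularity for a symmetric matrix A with nonnegative entries.

    Q = (1 / 2m) * sum_{i,j} (A_ij - d_i d_j / 2m) * 1[labels_i == labels_j],
    where d_i is the weighted degree of node i and 2m is the sum of all degrees.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    d = A.sum(axis=1)
    two_m = d.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(d, d) / two_m) * same).sum() / two_m)
```

With the counts reported above for CoauthorshipsNet1589, τ is 1585/1589 ≈ 0.997 for DFA and 1025/1589 ≈ 0.645 for nDFA, which is the sense in which DFA puts almost all nodes into one community.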

Conclusion
In this paper, we introduced the Degree-Corrected Distribution-Free Model (DCDFM), a model for community detection on weighted networks. The proposed model extends previous Distribution-Free Models by incorporating node heterogeneity to model real-world weighted networks in which node degrees vary, and it also extends the classical degree-corrected stochastic blockmodels to weighted networks by allowing the connectivity matrix to have negative elements and allowing the elements of the adjacency matrix A to be generated from an arbitrary distribution, as long as the expected adjacency matrix has the block structure in Eq. (5). We developed an efficient spectral algorithm for estimating node labels under DCDFM by applying the k-means algorithm to all rows of the normalized eigenvectors of the adjacency matrix. Theoretical results obtained by delicate spectral analysis guarantee that the algorithm is asymptotically consistent. The distribution-free property of our model allows us to analyze the behavior of our algorithm when F is set to different distributions. When DCDFM degenerates to DFM or DCSBM, our theoretical results match those under DFM or DCSBM. Numerical results on both simulated and empirical weighted networks demonstrate the advantage of our method, which is designed to account for node heterogeneity. Meanwhile, to compare the performance of different methods on weighted networks with unknown community information, we proposed the general modularity as an extension of Newman's modularity. Results on simulated weighted networks and real-world un-weighted networks suggest the effectiveness of the general modularity. The tools developed in this paper can be widely applied to study the latent structural information of both weighted and un-weighted networks. Another benefit of DCDFM is the potential for simulating weighted networks under different distributions.
Furthermore, there are many directions in which our current work can be extended. For example, K is assumed to be known in this paper. However, for most real-world weighted networks, K is unknown, so estimating K is an interesting topic. Some possible techniques for estimating K can be found in [46][47][48]. Similar to 4, studying the influence of outlier nodes theoretically for weighted networks is an interesting problem. Developing methods for weighted-network community detection based on modularity maximization under DCDFM, similar to those studied in 6, is also interesting. Meanwhile, spectral algorithms accelerated by the ideas of random projection and random sampling developed in 35 can be applied to handle large-scale networks, and these ideas can be applied directly to weighted network community detection under DCDFM. We leave the study of these problems for future work.
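For readers who want a concrete picture of the spectral algorithm summarized in this conclusion (k-means applied to the rows of the normalized eigenvectors of the adjacency matrix), the following Python sketch may help. It is an illustration under stated assumptions rather than the authors' implementation: the K leading eigenvectors are chosen by eigenvalue magnitude (so that negative eigenvalues, which can arise when the connectivity matrix has negative elements, are not discarded), each row is scaled to unit Euclidean norm, and a plain Lloyd-style k-means with random restarts is used. The names `kmeans` and `ndfa_sketch` are hypothetical.

```python
import numpy as np

def kmeans(X, K, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd k-means with random restarts; returns the lowest-cost labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(X.shape[0], size=K, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old center if a cluster becomes empty.
            new_centers = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(K)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        cost = dist[np.arange(X.shape[0]), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

def ndfa_sketch(A, K, seed=0):
    """Spectral clustering on row-normalized leading eigenvectors of symmetric A."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:K]  # assumption: leading by magnitude
    U = vecs[:, top]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows (e.g. isolated nodes) unchanged
    return kmeans(U / norms, K, seed=seed)
```

The row normalization is what removes the effect of node heterogeneity in this sketch: under a degree-corrected block structure, rows of the leading eigenvector matrix that belong to the same community differ only by a positive scale factor, so they coincide after scaling to unit norm.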

Data availability
All data and code that support the findings of this study are available from the corresponding author upon reasonable request.